Massively Scalable Inverse Reinforcement Learning in Google Maps
Optimizing for humans' latent preferences is a grand challenge in route
recommendation, where globally scalable solutions remain an open problem.
Although past work created increasingly general solutions for the application
of inverse reinforcement learning (IRL), these have not been successfully
scaled to world-sized MDPs, large datasets, and highly parameterized models, with
hundreds of millions of states, trajectories, and parameters, respectively. In
this work, we surpass previous limitations through a series of advancements
focused on graph compression, parallelization, and problem initialization based
on dominant eigenvectors. We introduce Receding Horizon Inverse Planning
(RHIP), which generalizes existing work and enables control of key performance
trade-offs via its planning horizon. Our policy achieves a 16-24% improvement
in global route quality, and, to our knowledge, represents the largest instance
of IRL in a real-world setting to date. Our results show critical benefits to
more sustainable modes of transportation (e.g. two-wheelers), where factors
beyond journey time (e.g. route safety) play a substantial role. We conclude
with ablations of key components and negative results on state-of-the-art
eigenvalue solvers, and we identify future opportunities to improve scalability
via IRL-specific batching strategies.
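
The abstract mentions initialization based on dominant eigenvectors but does not say how they are computed. As a generic illustration only, and not the authors' method, the sketch below uses plain power iteration to approximate the dominant eigenpair of a small matrix. The function names, the toy matrix, and the tolerance are all assumptions made for this example.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def dominant_eigenpair(matrix, num_iters=1000, tol=1e-12):
    """Approximate the dominant eigenvalue and unit eigenvector by power iteration.

    Hypothetical helper for illustration; assumes the matrix has a unique
    eigenvalue of largest magnitude and that this eigenvalue is positive.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # uniform unit-length starting vector
    for _ in range(num_iters):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("matrix maps the current iterate to zero")
        next_vec = [x / norm for x in product]
        change = max(abs(a - b) for a, b in zip(next_vec, vec))
        vec = next_vec
        if change < tol:
            break
    # For a unit eigenvector v, v . (M v) equals the eigenvalue.
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


# Toy symmetric matrix, chosen only for this example.
toy = [[2.0, 1.0], [1.0, 3.0]]
value, vector = dominant_eigenpair(toy)
```

Power iteration is chosen here only because it is the simplest such method; its convergence rate depends on the gap between the two largest eigenvalue magnitudes, so it says nothing about which solvers are practical at the scale the abstract describes.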